
K8SPSMDB-1654: Add vector search support#2360

Open
egegunes wants to merge 4 commits into main from K8SPSMDB-1654-implementation

@egegunes egegunes commented May 26, 2026

Contributor

CHANGE DESCRIPTION

This PR adds vector search (mongot) support to the operator, based on the proposal in #2332.

I won't repeat what is already in the proposal. The implementation details are as follows:

  • Single-replset and sharded clusters are both supported.
  • The pkg/psmdb/vectorsearch package contains all the code that builds the StatefulSet, ConfigMap, and Service for vector search.
  • Both mongod and mongos require special configuration to communicate with mongot instances. We introduce two functions, InjectMongodConfig and InjectMongosConfig, which inject the required parameters into the setParameter section without overriding the custom configuration that users already have in their cr.yaml, and which create the necessary ConfigMap objects.
  • The operator generates the default configuration for mongot and stores it in a ConfigMap. If users want to customize mongot options, they can put a partial configuration in search.configuration, and the operator overrides only the customized fields.
  • I needed to use our init container functions to inject the entrypoint for mongot containers. That failed because our init container tries to mount the mongod-data volume by default. I decided to remove that mount for crVersion >1.23 because this volume is not used in init containers at all. This caused a lot of diffs in the e2e test compare files.
  • This PR adds two e2e tests: vector-search and vector-search-sharded. These tests require IMAGE_MONGOD to point to PSMDB v8.3, for which we don't have images yet. In my testing I used a custom image: perconalab/percona-server-mongodb:8.3. We don't have images for vector search either, but the upstream image works fine: mongodb/mongodb-community-search:latest. Note that once we have images provided by Percona, we might need to change the command we run to start mongot.
  • vector-search-sharded tests logical backups/restores with vector search enabled. PBM has a bug that breaks sharded v8.3 clusters after logical restores. I decided to skip this test until it's resolved.
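
To make the setParameter injection described above easier to picture, here is a minimal sketch of the merge idea. It is not the code from this PR: the function name, the map-based config representation, the example host, and the choice that operator-managed keys win on conflict are all assumptions made for illustration.

```go
package main

import "fmt"

// injectSetParameter returns a copy of the user's configuration with the
// required keys written into its setParameter section. Every other user
// setting is carried over as-is (top-level sections are copied shallowly).
// Assumption: operator-managed keys win if the user set the same key.
func injectSetParameter(userCfg, required map[string]any) map[string]any {
	out := make(map[string]any, len(userCfg)+1)
	for k, v := range userCfg {
		out[k] = v
	}

	// Copy the existing setParameter section so the caller's map is not mutated.
	params := map[string]any{}
	if existing, ok := userCfg["setParameter"].(map[string]any); ok {
		for k, v := range existing {
			params[k] = v
		}
	}
	for k, v := range required {
		params[k] = v
	}
	out["setParameter"] = params
	return out
}

func main() {
	user := map[string]any{
		"operationProfiling": map[string]any{"mode": "slowOp"},
		"setParameter":       map[string]any{"ttlMonitorSleepSecs": 60},
	}
	// Hypothetical host value; the exact parameters the PR injects are not shown here.
	required := map[string]any{"mongotHost": "my-cluster-rs0-search:27028"}

	merged := injectSetParameter(user, required)
	fmt.Println(merged["setParameter"])
}
```

The user's own setParameter entries and unrelated sections survive the merge, which is the property the description calls "without overriding the custom configuration".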

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?
  • Are OpenShift compare files changed for E2E tests (compare/*-oc.yml)?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported MongoDB version?
  • Does the change support oldest and newest supported Kubernetes version?

Copilot AI review requested due to automatic review settings May 26, 2026 11:39
@pull-request-size pull-request-size Bot added the size/XXL 1000+ lines label May 26, 2026

Copilot AI left a comment

Contributor

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@egegunes egegunes force-pushed the K8SPSMDB-1654-implementation branch from 7183630 to f41e238 Compare May 26, 2026 16:24
@hors hors added this to the v1.23.0 milestone Jun 8, 2026
Copilot AI review requested due to automatic review settings June 9, 2026 09:51
@egegunes egegunes force-pushed the K8SPSMDB-1654-implementation branch from f41e238 to bf080cd Compare June 9, 2026 09:51

Copilot AI left a comment

Contributor

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

Copilot AI review requested due to automatic review settings June 9, 2026 15:06

@egegunes egegunes force-pushed the K8SPSMDB-1654-implementation branch from 51c6e61 to 8607ca8 Compare June 10, 2026 17:34
Copilot AI review requested due to automatic review settings June 10, 2026 17:34

Copilot AI review requested due to automatic review settings June 11, 2026 07:59

Comment on lines +36 to +37
accessModes:
- ReadWriteOnce

Contributor

I think it should be set by default in CheckNSetDefaults, the same way we do it for replsets.

@JNKPercona

Collaborator
Test Name Result Time
arbiter passed 00:11:07
balancer passed 00:18:37
cert-management-policy passed 00:08:56
cross-site-sharded passed 00:19:16
custom-replset-name passed 00:10:35
custom-tls passed 00:14:50
custom-users-roles passed 00:10:18
custom-users-roles-sharded passed 00:12:01
data-at-rest-encryption passed 00:13:02
data-sharded passed 00:22:50
demand-backup passed 00:17:06
demand-backup-eks-credentials-irsa passed 00:00:08
demand-backup-fs passed 00:25:23
demand-backup-if-unhealthy passed 00:08:28
demand-backup-incremental-aws passed 00:11:32
demand-backup-incremental-azure passed 00:11:44
demand-backup-incremental-gcp-native passed 00:11:24
demand-backup-incremental-gcp-s3 passed 00:11:23
demand-backup-incremental-minio passed 00:25:52
demand-backup-incremental-sharded-aws passed 00:19:22
demand-backup-incremental-sharded-azure passed 00:17:55
demand-backup-incremental-sharded-gcp-native passed 00:18:00
demand-backup-incremental-sharded-gcp-s3 passed 00:17:16
demand-backup-incremental-sharded-minio passed 00:27:22
demand-backup-logical-minio-native-tls passed 00:08:56
demand-backup-physical-parallel passed 00:08:53
demand-backup-physical-aws passed 00:12:38
demand-backup-physical-azure passed 00:12:55
demand-backup-physical-gcp-s3 passed 00:12:12
demand-backup-physical-gcp-native passed 00:12:09
demand-backup-physical-minio passed 00:21:40
demand-backup-physical-minio-native passed 00:26:34
demand-backup-physical-minio-native-tls passed 00:20:04
demand-backup-physical-sharded-parallel passed 00:11:34
demand-backup-physical-sharded-aws passed 00:18:36
demand-backup-physical-sharded-azure passed 00:17:22
demand-backup-physical-sharded-gcp-native passed 00:17:49
demand-backup-physical-sharded-minio passed 00:17:34
demand-backup-physical-sharded-minio-native passed 00:17:51
demand-backup-sharded passed 00:27:22
demand-backup-snapshot passed 01:23:14
demand-backup-snapshot-vault passed 00:18:18
disabled-auth passed 00:17:15
expose-sharded passed 00:34:16
finalizer passed 00:10:33
ignore-labels-annotations passed 00:07:58
init-deploy passed 00:14:55
ldap passed 00:09:08
ldap-tls passed 00:12:57
limits passed 00:06:19
liveness passed 00:09:05
mongod-major-upgrade passed 00:13:15
mongod-major-upgrade-sharded passed 00:21:28
monitoring-2-0 passed 00:26:26
monitoring-pmm3 passed 00:26:34
multi-cluster-service passed 00:17:13
multi-storage passed 00:20:18
non-voting-and-hidden passed 00:16:58
one-pod passed 00:08:05
operator-self-healing-chaos passed 00:12:53
pitr passed 00:38:04
pitr-physical passed 01:00:06
pitr-sharded passed 00:22:01
pitr-to-new-cluster passed 00:25:44
pitr-physical-backup-source passed 00:54:45
preinit-updates passed 00:05:10
pvc-auto-resize passed 00:12:13
pvc-resize passed 00:17:54
recover-no-primary passed 00:28:03
replset-overrides passed 00:18:40
replset-remapping passed 00:17:24
replset-remapping-sharded passed 00:18:23
rs-shard-migration passed 00:14:54
scaling passed 00:11:21
scheduled-backup passed 00:18:37
security-context passed 00:07:14
self-healing-chaos passed 00:15:24
service-per-pod passed 00:19:00
serviceless-external-nodes passed 00:07:27
smart-update passed 00:08:24
split-horizon passed 00:14:00
split-horizon-manual-tls passed 00:12:16
stable-resource-version passed 00:04:50
storage passed 00:07:27
tls-issue-cert-manager passed 00:31:00
unsafe-psa passed 00:07:43
upgrade passed 00:10:32
upgrade-consistency passed 00:08:22
upgrade-consistency-sharded-tls passed 00:59:18
upgrade-sharded passed 00:20:10
upgrade-partial-backup passed 00:17:07
users passed 00:18:17
users-vault passed 00:13:28
vector-search passed 00:00:08
vector-search-sharded passed 00:00:07
version-service passed 00:25:38
Summary Value
Tests Run 96/96
Job Duration 03:07:35
Total Test Time 28:07:38

commit: 3171494
image: perconalab/percona-server-mongodb-operator:PR-2360-317149465

// unmarshals into the existing struct in place, so only fields that
// appear in spec.Configuration are overridden — every other default
// is preserved.
func userMongotConfig(cr *api.PerconaServerMongoDB, rs *api.ReplsetSpec) (mongot.Config, error) {

Member

Let's add a unit test here.

Contributor Author

got, err := userMongotConfig(cr, rs)
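
For readers following the thread, the "unmarshal into the existing struct in place" behaviour that such a unit test would pin down can be sketched as below. This is only an illustration: searchConfig and serverConf are hypothetical stand-ins for mongot.Config, and encoding/json is used here purely to keep the example dependency-free.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchConfig is a hypothetical, much smaller stand-in for mongot.Config.
type searchConfig struct {
	LogLevel string     `json:"logLevel"`
	Server   serverConf `json:"server"`
}

type serverConf struct {
	Port    int  `json:"port"`
	Metrics bool `json:"metrics"`
}

// applyOverrides starts from the defaults and decodes the user's partial
// document on top of them. Fields absent from the document keep their defaults.
func applyOverrides(defaults searchConfig, partial []byte) (searchConfig, error) {
	cfg := defaults // struct copy; the caller's defaults are left untouched
	if len(partial) == 0 {
		return cfg, nil
	}
	if err := json.Unmarshal(partial, &cfg); err != nil {
		return searchConfig{}, fmt.Errorf("unmarshal user configuration: %w", err)
	}
	return cfg, nil
}

func main() {
	defaults := searchConfig{LogLevel: "INFO", Server: serverConf{Port: 27028, Metrics: true}}
	cfg, err := applyOverrides(defaults, []byte(`{"server": {"port": 27030}}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Only server.port changes in this run; logLevel and server.metrics keep their default values, which is the case a table-driven test would most likely want to cover alongside an empty and a malformed override.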

// SearchHost returns the fully qualified address mongod uses to reach
// this replset's mongot —
// `<cluster>-<rs>-search-0.<cluster>-<rs>-search.<ns>.<dnsSuffix>:27028`.
func SearchHost(cr *api.PerconaServerMongoDB, rs *api.ReplsetSpec) string {

Member

It is called in the same package, do we need to export it?
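
As a reference for the address format in the doc comment, a minimal unexported variant could look like the sketch below. The plain string parameters are an assumption made to keep the example self-contained; the function in the PR takes the CR and the replset spec instead.

```go
package main

import "fmt"

// searchPort is the port named in the doc comment above.
const searchPort = 27028

// searchHost builds
// "<cluster>-<rs>-search-0.<cluster>-<rs>-search.<ns>.<dnsSuffix>:27028".
func searchHost(cluster, rs, namespace, dnsSuffix string) string {
	svc := fmt.Sprintf("%s-%s-search", cluster, rs)
	return fmt.Sprintf("%s-0.%s.%s.%s:%d", svc, svc, namespace, dnsSuffix, searchPort)
}

func main() {
	fmt.Println(searchHost("my-cluster", "rs0", "psmdb", "svc.cluster.local"))
	// prints: my-cluster-rs0-search-0.my-cluster-rs0-search.psmdb.svc.cluster.local:27028
}
```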

// containers are ready, AppStateStopping / AppStatePaused when the
// cluster is paused. NotFound on the StatefulSet (creation in flight)
// also returns AppStateInit.
func (r *ReconcilePerconaServerMongoDB) searchStatus(ctx context.Context, cr *api.PerconaServerMongoDB, rs *api.ReplsetSpec) (api.SearchStatus, error) {

Member

Can we add a unit test for this using a fake client?

}
}

if cr.Spec.Search != nil && cr.Spec.Search.Enabled {

Member

Suggested change
- if cr.Spec.Search != nil && cr.Spec.Search.Enabled {
+ if cr.IsSearchEnabled() {
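
For context, a nil-safe helper of the kind this suggestion relies on might look like the following sketch. The types are simplified, hypothetical stand-ins for the CR types, and the nil-receiver guard is an assumption rather than something the PR is known to do.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the operator's CR types.
type SearchSpec struct{ Enabled bool }
type ClusterSpec struct{ Search *SearchSpec }
type Cluster struct{ Spec ClusterSpec }

// IsSearchEnabled folds the nil check and the flag check into one place,
// so call sites cannot forget the nil guard.
func (cr *Cluster) IsSearchEnabled() bool {
	return cr != nil && cr.Spec.Search != nil && cr.Spec.Search.Enabled
}

func main() {
	var missing *Cluster
	off := &Cluster{}
	on := &Cluster{Spec: ClusterSpec{Search: &SearchSpec{Enabled: true}}}
	fmt.Println(missing.IsSearchEnabled(), off.IsSearchEnabled(), on.IsSearchEnabled())
	// prints: false false true
}
```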

	toDelete = append(toDelete, naming.ArbiterStatefulSetName(cluster, rs))
}
if cluster.IsSearchEnabled() {
	toDelete = append(toDelete, naming.SearchStatefulSetName(cluster, rs))

Member

Does this work? I think we might be trying to delete one sts for the configsvr replset too (which does not exist). We might need the rs.ClusterRole == api.ClusterRoleConfigSvr check here too.
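
A rough sketch of the guard being asked for is shown below. It assumes, as the comment does, that no search StatefulSet exists for the config server replset. The replset type, the role constant, and the helper are hypothetical; the "<cluster>-<rs>-search" name is inferred from the pod name in the SearchHost doc comment.

```go
package main

import "fmt"

// Hypothetical stand-ins for api.ReplsetSpec and api.ClusterRoleConfigSvr.
type replset struct {
	Name        string
	ClusterRole string
}

const clusterRoleConfigSvr = "configsvr"

// searchStatefulSetsToDelete lists the search StatefulSets to remove and
// skips the config server replset, which is assumed to have none.
func searchStatefulSetsToDelete(cluster string, searchEnabled bool, replsets []replset) []string {
	if !searchEnabled {
		return nil
	}
	var names []string
	for _, rs := range replsets {
		if rs.ClusterRole == clusterRoleConfigSvr {
			continue
		}
		names = append(names, cluster+"-"+rs.Name+"-search")
	}
	return names
}

func main() {
	replsets := []replset{
		{Name: "rs0", ClusterRole: "shardsvr"},
		{Name: "cfg", ClusterRole: clusterRoleConfigSvr},
	}
	fmt.Println(searchStatefulSetsToDelete("my-cluster", true, replsets))
	// prints: [my-cluster-rs0-search]
}
```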

	annotations = map[string]string{}
}
if configHash != "" {
	annotations["percona.com/configuration-hash"] = configHash

Member

Does it need the SSL hashes too? Not sure how it picks up SSL cert changes.
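
One way to address the question, sketched under assumptions: fold the certificate bytes into the same pod-template hash, so a certificate rotation also changes the annotation and rolls the pods. The contentHash helper and the sample inputs are hypothetical; only the annotation key comes from the diff above, and whether the PR handles certificate changes some other way is not known from this thread.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash hashes every input that should trigger a rolling restart.
// Each part is length-prefixed so ("ab", "c") and ("a", "bc") differ.
func contentHash(parts ...[]byte) string {
	h := sha256.New()
	for _, p := range parts {
		fmt.Fprintf(h, "%d:", len(p))
		h.Write(p)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	config := []byte("placeholder mongot configuration") // placeholder ConfigMap content
	cert := []byte("cert-v1")                             // placeholder TLS secret content

	annotations := map[string]string{}
	if hash := contentHash(config, cert); hash != "" {
		annotations["percona.com/configuration-hash"] = hash
	}
	fmt.Println(annotations)
}
```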

}

if rs == nil || rs.Search == nil {
	return cr.Spec.Search

Member

Do we need to return DeepCopy() here as well? I think we might be updating a shared object on the caller side when no overrides are specified
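
The aliasing concern can be shown with a small sketch. SearchSpec here is a hypothetical reduced type with a hand-written DeepCopy, and effectiveSearch only models the "no override" branch faithfully; how the PR merges replset-level overrides is not shown in this thread.

```go
package main

import "fmt"

// SearchSpec is a hypothetical, reduced version of the search spec.
type SearchSpec struct {
	Enabled bool
	Labels  map[string]string
}

// DeepCopy returns an independent copy, including the Labels map.
func (s *SearchSpec) DeepCopy() *SearchSpec {
	if s == nil {
		return nil
	}
	out := *s
	if s.Labels != nil {
		out.Labels = make(map[string]string, len(s.Labels))
		for k, v := range s.Labels {
			out.Labels[k] = v
		}
	}
	return &out
}

// effectiveSearch returns a copy in both branches, so callers can set
// defaults on the result without touching the cluster-level spec.
func effectiveSearch(clusterLevel, rsLevel *SearchSpec) *SearchSpec {
	if rsLevel == nil {
		return clusterLevel.DeepCopy()
	}
	return rsLevel.DeepCopy()
}

func main() {
	cluster := &SearchSpec{Enabled: true, Labels: map[string]string{"tier": "search"}}
	got := effectiveSearch(cluster, nil)
	got.Labels["tier"] = "changed"
	fmt.Println(cluster.Labels["tier"]) // prints: search
}
```

Returning cr.Spec.Search directly would make the last line print "changed", because the caller and the CR would share one map.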

Comment on lines +81 to +89
if err := r.client.Delete(ctx, sts); err != nil && !k8serrors.IsNotFound(err) {
	return errors.Wrapf(err, "delete StatefulSet %s", sts.Name)
}
if err := r.client.Delete(ctx, svc); err != nil && !k8serrors.IsNotFound(err) {
	return errors.Wrapf(err, "delete Service %s", svc.Name)
}
if err := r.client.Delete(ctx, cm); err != nil && !k8serrors.IsNotFound(err) {
	return errors.Wrapf(err, "delete ConfigMap %s", cm.Name)
}

Member

This will help with the blind deletes - #2389

// StatefulSet returns the StatefulSet object that runs the mongot
// group for the given replset. configHash is set as a pod-template
// annotation so a ConfigMap content change rolls the pods.
func StatefulSet(cr *api.PerconaServerMongoDB, rs *api.ReplsetSpec, initImage, configHash string) *appsv1.StatefulSet {

Member

A unit test for this would be nice as well
